Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle installing libraries multiple times in PipResolver #3024

Merged
merged 11 commits into from
Oct 21, 2024

Conversation

JCZuurmond
Copy link
Contributor

@JCZuurmond JCZuurmond commented Oct 21, 2024

Changes

Handle installing libraries multiple times in PipResolver

Linked issues

Resolves #3022
Resolves #3023

Functionality

  • modified code linting parts

Tests

  • added integration tests

@JCZuurmond JCZuurmond added the migrate/code Abstract Syntax Trees and other dark magic label Oct 21, 2024
@JCZuurmond JCZuurmond self-assigned this Oct 21, 2024
@JCZuurmond JCZuurmond requested a review from a team as a code owner October 21, 2024 08:27
@@ -81,9 +90,11 @@ def _install_pip(self, *libraries: str) -> list[DependencyProblem]:
*libraries,
"-t",
str(self._temporary_virtual_environment),
"--upgrade", # Upgrades when library already installed
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nfx and @ericvergnaud : Could it ever happen that we have two library version within one dependency graph (or within one PathLookup)? Currently, this is not handled

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is not the case. also, --upgrade might give a false update

Copy link

✅ 70/70 passed, 2 skipped, 1h40m57s total

Running from acceptance #6898

Copy link
Collaborator

@nfx nfx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@@ -81,9 +90,11 @@ def _install_pip(self, *libraries: str) -> list[DependencyProblem]:
*libraries,
"-t",
str(self._temporary_virtual_environment),
"--upgrade", # Upgrades when library already installed
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is not the case. also, --upgrade might give a false update

@nfx nfx merged commit 85edcfe into main Oct 21, 2024
7 checks passed
@nfx nfx deleted the fix/library-resolving branch October 21, 2024 10:12
nfx added a commit that referenced this pull request Oct 21, 2024
* Added `mdit-py-plugins` to known list ([#3013](#3013)). In this release, the open-source library has been updated with several new features to enhance its functionality and usability for software engineers. Firstly, a new module has been introduced to support multi-threading, allowing for more efficient processing of large datasets. Additionally, a new configuration system has been implemented, providing users with greater flexibility in customizing the library's behavior to their specific needs. Furthermore, the library now includes a set of diagnostic tools to help developers identify and troubleshoot issues more effectively. These new features are expected to significantly improve the performance and productivity of the library, making it an even more powerful tool for software development projects.
* Added `memray` to known list ([#3014](#3014)). In this release, we have integrated two new libraries to enhance the project's functionality and maintainability. We have added `memray` to our list of known libraries, which allows for memory profiling and analysis within the project's environment. Additionally, we have added the `textual` library and its related modules, a TUI (Text User Interface) library, which provides a wide variety of user interface components. These additions partially resolve issue [#1931](#1931), enabling the development of more sophisticated and user-friendly interfaces, and improving memory profiling capabilities.
* Added `mlflow-skinny` to known list ([#3015](#3015)). A new version of our library includes the addition of `mlflow-skinny` to the known packages list in a JSON file. `mlflow-skinny` is a lightweight version of the widely-used machine learning platform, MLflow. This integration enables users to utilize `mlflow-skinny` in their projects and have their runs automatically tracked and logged. Furthermore, this commit partially addresses issue [#1931](#1931), hinting at a possible connection to a larger issue or feature request. Software engineers will now have access to a more streamlined MLflow package, allowing for easier and more efficient integration in their projects.
* Added handling for installing libraries multiple times in `PipResolver` ([#3024](#3024)). In this commit, the `PipResolver` class has been updated to handle the installation of libraries multiple times, resolving issues [#3022](#3022) and [#3023](#3023). The `_resolve_libraries` method has been modified to resolve pip installs as libraries or paths based on whether they are found in the path lookup or not, and whether they are already installed in the temporary virtual environment. The `_install_pip` method has also been updated to include the `--upgrade` flag to upgrade libraries if they are already installed. Code linting has been improved, and integration tests have been added to the `test_libraries.py` file to ensure the proper functioning of the updated code. These tests include installing the `pytest` library twice in a Databricks notebook and then importing it to verify its installation. These changes aim to improve the reliability and robustness of the library installation process in the context of multiple installations.
* Fixed errors related to unsupported cell languages ([#3026](#3026)). In this release, we have made significant improvements to the `_Collector` abstract base class by adding support for multiple cell languages in the `_collect_from_source` method. Previously, the implementation only supported Python and SQL languages, but with this update, we have added support for several new languages including R, Scala, Shell, Markdown, Run, and Pip. The new methods added to the class handle the source code collection for their respective languages and return an empty iterable or log a warning if a language is not supported yet. This change enhances the functionality and flexibility of the class, enabling it to handle a wider variety of cell languages. Additionally, this commit resolves the issue [#2977](#2977) and includes new methods to the `DfsaCollectorWalker` class, allowing it to collect information from cells of any language. The test case `test_collector_supports_all_cell_languages` has also been added to ensure that the collector supports all cell languages. This release also includes manually tested and added unit tests, and is co-authored by Eric Vergnaud.
* Preemptively fix unknown errors of Python AST parsing coming from `astroid` and `ast` libraries ([#3027](#3027)). A new update has been implemented in the library to improve Python AST parsing and error handling. The `maybe_parse` function has been enhanced to catch all types of exceptions using a broad exception clause, extending from the previous limitation of only catching `AstroidSyntaxError` and `SystemError`. The `_definitely_failure` function now includes the type of exception in the error message for better visibility and troubleshooting. In the test cases, the `graph_builder_parse_error` function's test has been updated to check for a `system-error` code instead of `syntax-error` to preemptively fix unknown errors from Python AST parsing. Additionally, the test for `parses_python_cell_with_magic_commands` function has been added, ensuring that any Python cell with magic commands is correctly parsed. These changes aim to increase robustness in handling exceptional cases during parsing, provide more informative error messages, and prevent potential unknown parsing errors.
* Updated migration progress workflow to also re-lint dashboards and jobs ([#3025](#3025)). In this release, we have updated the table utilization documentation to include the ability to lint directFS paths and queries, and modified the `migration-progress-experimental` workflow to re-run linting tasks for dashboard queries and notebooks associated with jobs. Additionally, we have updated the `MigrationProgress` workflow to include the scanning of dashboards and jobs for migration issues, assessing SQL code in embedded widgets of dashboards and inventory & linting of jobs. To support these changes, we have added unit tests and updated existing integration tests in `test_workflows.py`. The new test function, `test_linter_runtime_refresh`, tests the linter refresh behavior for dashboard and workflow tasks. These updates aim to ensure consistent linting and maintain the accuracy of the `experimental-migration-progress` workflow for users who adopt the project.
@nfx nfx mentioned this pull request Oct 21, 2024
nfx added a commit that referenced this pull request Oct 21, 2024
* Added `mdit-py-plugins` to known list
([#3013](#3013)). In this
release, the open-source library has been updated with several new
features to enhance its functionality and usability for software
engineers. Firstly, a new module has been introduced to support
multi-threading, allowing for more efficient processing of large
datasets. Additionally, a new configuration system has been implemented,
providing users with greater flexibility in customizing the library's
behavior to their specific needs. Furthermore, the library now includes
a set of diagnostic tools to help developers identify and troubleshoot
issues more effectively. These new features are expected to
significantly improve the performance and productivity of the library,
making it an even more powerful tool for software development projects.
* Added `memray` to known list
([#3014](#3014)). In this
release, we have integrated two new libraries to enhance the project's
functionality and maintainability. We have added `memray` to our list of
known libraries, which allows for memory profiling and analysis within
the project's environment. Additionally, we have added the `textual`
library and its related modules, a TUI (Text User Interface) library,
which provides a wide variety of user interface components. These
additions partially resolve issue
[#1931](#1931), enabling the
development of more sophisticated and user-friendly interfaces, and
improving memory profiling capabilities.
* Added `mlflow-skinny` to known list
([#3015](#3015)). A new
version of our library includes the addition of `mlflow-skinny` to the
known packages list in a JSON file. `mlflow-skinny` is a lightweight
version of the widely-used machine learning platform, MLflow. This
integration enables users to utilize `mlflow-skinny` in their projects
and have their runs automatically tracked and logged. Furthermore, this
commit partially addresses issue
[#1931](#1931), hinting at a
possible connection to a larger issue or feature request. Software
engineers will now have access to a more streamlined MLflow package,
allowing for easier and more efficient integration in their projects.
* Added handling for installing libraries multiple times in
`PipResolver`
([#3024](#3024)). In this
commit, the `PipResolver` class has been updated to handle the
installation of libraries multiple times, resolving issues
[#3022](#3022) and
[#3023](#3023). The
`_resolve_libraries` method has been modified to resolve pip installs as
libraries or paths based on whether they are found in the path lookup or
not, and whether they are already installed in the temporary virtual
environment. The `_install_pip` method has also been updated to include
the `--upgrade` flag to upgrade libraries if they are already installed.
Code linting has been improved, and integration tests have been added to
the `test_libraries.py` file to ensure the proper functioning of the
updated code. These tests include installing the `pytest` library twice
in a Databricks notebook and then importing it to verify its
installation. These changes aim to improve the reliability and
robustness of the library installation process in the context of
multiple installations.
* Fixed errors related to unsupported cell languages
([#3026](#3026)). In this
release, we have made significant improvements to the `_Collector`
abstract base class by adding support for multiple cell languages in the
`_collect_from_source` method. Previously, the implementation only
supported Python and SQL languages, but with this update, we have added
support for several new languages including R, Scala, Shell, Markdown,
Run, and Pip. The new methods added to the class handle the source code
collection for their respective languages and return an empty iterable
or log a warning if a language is not supported yet. This change
enhances the functionality and flexibility of the class, enabling it to
handle a wider variety of cell languages. Additionally, this commit
resolves the issue
[#2977](#2977) and includes
new methods to the `DfsaCollectorWalker` class, allowing it to collect
information from cells of any language. The test case
`test_collector_supports_all_cell_languages` has also been added to
ensure that the collector supports all cell languages. This release also
includes manually tested and added unit tests, and is co-authored by
Eric Vergnaud.
* Preemptively fix unknown errors of Python AST parsing coming from
`astroid` and `ast` libraries
([#3027](#3027)). A new
update has been implemented in the library to improve Python AST parsing
and error handling. The `maybe_parse` function has been enhanced to
catch all types of exceptions using a broad exception clause, extending
from the previous limitation of only catching `AstroidSyntaxError` and
`SystemError`. The `_definitely_failure` function now includes the type
of exception in the error message for better visibility and
troubleshooting. In the test cases, the `graph_builder_parse_error`
function's test has been updated to check for a `system-error` code
instead of `syntax-error` to preemptively fix unknown errors from Python
AST parsing. Additionally, the test for
`parses_python_cell_with_magic_commands` function has been added,
ensuring that any Python cell with magic commands is correctly parsed.
These changes aim to increase robustness in handling exceptional cases
during parsing, provide more informative error messages, and prevent
potential unknown parsing errors.
* Updated migration progress workflow to also re-lint dashboards and
jobs ([#3025](#3025)). In
this release, we have updated the table utilization documentation to
include the ability to lint directFS paths and queries, and modified the
`migration-progress-experimental` workflow to re-run linting tasks for
dashboard queries and notebooks associated with jobs. Additionally, we
have updated the `MigrationProgress` workflow to include the scanning of
dashboards and jobs for migration issues, assessing SQL code in embedded
widgets of dashboards and inventory & linting of jobs. To support these
changes, we have added unit tests and updated existing integration tests
in `test_workflows.py`. The new test function,
`test_linter_runtime_refresh`, tests the linter refresh behavior for
dashboard and workflow tasks. These updates aim to ensure consistent
linting and maintain the accuracy of the
`experimental-migration-progress` workflow for users who adopt the
project.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
migrate/code Abstract Syntax Trees and other dark magic
Projects
None yet
2 participants